implicit network
A Generalization Bound for a Family of Implicit Networks
Fung, Samy Wu, Berkels, Benjamin
Implicit networks are a class of neural networks whose outputs are defined by the fixed point of a parameterized operator. They have enjoyed success in many applications including natural language processing, image processing, and numerous other applications. While they have found abundant empirical success, theoretical work on its generalization is still under-explored. In this work, we consider a large family of implicit networks defined parameterized contractive fixed point operators. We show a generalization bound for this class based on a covering number argument for the Rademacher complexity of these architectures.
Implicit Neural Representations for Speed-of-Sound Estimation in Ultrasound
Byra, Michal, Jarosik, Piotr, Karwat, Piotr, Klimonda, Ziemowit, Lewandowski, Marcin
Accurate estimation of the speed-of-sound (SoS) is important for ultrasound (US) image reconstruction techniques and tissue characterization. Various approaches have been proposed to calculate SoS, ranging from tomography-inspired algorithms like CUTE to convolutional networks, and more recently, physics-informed optimization frameworks based on differentiable beamforming. In this work, we utilize implicit neural representations (INRs) for SoS estimation in US. INRs are a type of neural network architecture that encodes continuous functions, such as images or physical quantities, through the weights of a network. Implicit networks may overcome the current limitations of SoS estimation techniques, which mainly arise from the use of non-adaptable and oversimplified physical models of tissue. Moreover, convolutional networks for SoS estimation, usually trained using simulated data, often fail when applied to real tissues due to out-of-distribution and data-shift issues. In contrast, implicit networks do not require extensive training datasets since each implicit network is optimized for an individual data case. This adaptability makes them suitable for processing US data collected from varied tissues and across different imaging protocols. We evaluated the proposed SoS estimation method based on INRs using data collected from a tissue-mimicking phantom containing four cylindrical inclusions, with SoS values ranging from 1480 m/s to 1600 m/s. The inclusions were immersed in a material with an SoS value of 1540 m/s. In experiments, the proposed method achieved strong performance, clearly demonstrating the usefulness of implicit networks for quantitative US applications.
Training Implicit Networks for Image Deblurring using Jacobian-Free Backpropagation
Liu, Linghai, Tong, Shuaicheng, Zhao, Lisa
Recent efforts in applying implicit networks to solve inverse problems in imaging have achieved competitive or even superior results when compared to feedforward networks. These implicit networks only require constant memory during backpropagation, regardless of the number of layers. However, they are not necessarily easy to train. Gradient calculations are computationally expensive because they require backpropagating through a fixed point. In particular, this process requires solving a large linear system whose size is determined by the number of features in the fixed point iteration. This paper explores a recently proposed method, Jacobian-free Backpropagation (JFB), a backpropagation scheme that circumvents such calculation, in the context of image deblurring problems. Our results show that JFB is comparable against fine-tuned optimization schemes, state-of-the-art (SOTA) feedforward networks, and existing implicit networks at a reduced computational cost.
On the optimization and generalization of overparameterized implicit neural networks
Implicit neural networks have become increasingly attractive in the machine learning community since they can achieve competitive performance but use much less computational resources. Recently, a line of theoretical works established the global convergences for first-order methods such as gradient descent if the implicit networks are over-parameterized. However, as they train all layers together, their analyses are equivalent to only studying the evolution of the output layer. It is unclear how the implicit layer contributes to the training. Thus, in this paper, we restrict ourselves to only training the implicit layer. We show that global convergence is guaranteed, even if only the implicit layer is trained. On the other hand, the theoretical understanding of when and how the training performance of an implicit neural network can be generalized to unseen data is still under-explored. Although this problem has been studied in standard feed-forward networks, the case of implicit neural networks is still intriguing since implicit networks theoretically have infinitely many layers. Therefore, this paper investigates the generalization error for implicit neural networks. Specifically, we study the generalization of an implicit network activated by the ReLU function over random initialization. We provide a generalization bound that is initialization sensitive. As a result, we show that gradient flow with proper random initialization can train a sufficient over-parameterized implicit network to achieve arbitrarily small generalization errors.
Research Papers based on Infinite-Width Networks(Artificial Intelligence)
Abstract: Substantial work indicates that the dynamics of neural networks (NNs) is closely related to their initialization of parameters. Inspired by the phase diagram for two-layer ReLU NNs with infinite width (Luo et al., 2021), we make a step towards drawing a phase diagram for three-layer ReLU NNs with infinite width. First, we derive a normalized gradient flow for three-layer ReLU NNs and obtain two key independent quantities to distinguish different dynamical regimes for common initialization methods. With carefully designed experiments and a large computation cost, for both synthetic datasets and real datasets, we find that the dynamics of each layer also could be divided into a linear regime and a condensed regime, separated by a critical regime. The criteria is the relative change of input weights (the input weight of a hidden neuron consists of the weight from its input layer to the hidden neuron and its bias term) as the width approaches infinity during the training, which tends to 0, and O(1), respectively.
Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths
Implicit deep learning has recently become popular in the machine learning community since these implicit models can achieve competitive performance with state-of-the-art deep networks while using significantly less memory and computational resources. However, our theoretical understanding of when and how first-order methods such as gradient descent (GD) converge on \textit{nonlinear} implicit networks is limited. Although this type of problem has been studied in standard feed-forward networks, the case of implicit models is still intriguing because implicit networks have \textit{infinitely} many layers. The corresponding equilibrium equation probably admits no or multiple solutions during training. This paper studies the convergence of both gradient flow (GF) and gradient descent for nonlinear ReLU activated implicit networks. To deal with the well-posedness problem, we introduce a fixed scalar to scale the weight matrix of the implicit layer and show that there exists a small enough scaling constant, keeping the equilibrium equation well-posed throughout training. As a result, we prove that both GF and GD converge to a global minimum at a linear rate if the width $m$ of the implicit network is \textit{linear} in the sample size $N$, i.e., $m=\Omega(N)$.
Finding Influential Authors in Brand-Page Communities
Purohit, Hemant (Wright State University) | Ajmera, Jitendra (IBM Research, New Delhi) | Joshi, Sachindra (IBM Research, New Delhi) | Verma, Ashish (IBM Research, New Delhi) | Sheth, Amit (Wright State University)
Enterprises are increasingly using social media forums to engage with their customer online- a phenomenon known as Social Customer Relation Management (Social CRM) . In this context, it is important for an enterprise to identify “influential authors” and engage with them on a priority basis. We present a study towards finding influential authors on Twitter forums where an implicit network based on user interactions is created and analyzed. Furthermore, author profile features and user interaction features are combined in a decision tree classification model for finding influential authors. A novel objective evaluation criterion is used for evaluating various features and modeling techniques. We compare our methods with other approaches that use either only the formal connections or only the author profile features and show a significant improvement in the classification accuracy over these baselines as well as over using Klout score.